Abstract

Purpose The impact assessment of chemical compounds in 
Life Cycle Impact Assessment (LCIA) and Environmental 
Risk Assessment (ERA) requires a vast amount of data on 
the properties of the chemical compounds being assessed. 
The purpose of the present study is to explore statistical 
options for reduction of the data demand associated with 
characterisation of chemical emissions in LCIA and ERA. 
Materials and methods Based on a USEtox™ characterisation factor set consisting of 3,073 data records, multidimensional bilinear models for emission-compartment-specific fate characterisation of chemical emissions were derived by applying Partial Least Squares Regression. Two sets of meta-models were derived, requiring 63% and 75%, respectively, of the minimum data demand of the full USEtox™ characterisation model. The meta-models were derived by grouping the dependent variables (the fate factors obtained from the USEtox™ data set) and then selecting the independent chemical input parameters from the minimum data set needed for characterisation in USEtox™, according to general availability, importance and relevance for fate factor prediction.
Results and discussion Each approach (63% and 75% of the minimum data set needed for characterisation in USEtox™) yielded 66 meta-models. In general, good correlation was obtained between the observed fate factors (those included in the USEtox™ data set) and the predicted fate factors (those obtained from the meta-models), and the validation R² values were all in the range 0.41-0.96. The lower end of this range represents the few emission scenarios where the selected independent variables did not contain the appropriate information. Hence, most meta-models yielded fate factors in good correlation with the observed fate factors, with R² values in the upper end of the range during validation. In general, the more data-demanding approach yielded the highest R² values.

Conclusions The applied statistical approach illustrates that it 
is possible to derive meta-models from full fate and exposure 
models and that it is also possible to tailor the data demand of 
these meta-models according to various data and emission 
preferences. The results obtained in the study reveal that not all emission scenarios included in USEtox™ exploit the minimum data set equally, and the minimum data set may thus in many cases contain underused data.

Keywords Approximated fate modelling • Fate modelling • 
Model approximation • Simplified characterisation • 
Simplified fate modelling • Simplified impact assessment • 
Underused fate parameters • USEtox 

1 Introduction 

Assessment of toxic releases in environmental risk assessment (ERA) and Life Cycle Impact Assessment (LCIA) often proceeds with multimedia fate and exposure models attached to models of dose-response relationships. The applicability of such models is, however, obstructed by the fact that the environmental processes included in these models are complicated non-linear functions of a large number of parameters. Some of these are environmental parameters (such as the soil composition and the temperature), and others are substance-specific (such as the atmospheric degradation rate and the octanol-water partitioning coefficient). Substance-specific data in particular are often hard to obtain, especially for the thousands of less well-known and less common chemicals. This problem is especially marked in LCIA, where the releases of hundreds or even thousands of toxic chemicals throughout a product life cycle (from mining to final disposal) are aggregated into a few impact categories, such as human toxicity and aquatic ecotoxicity. It is therefore attractive to seek ways to circumvent the use of these models while keeping a close correspondence with their results. In fact, it would be appealing if one could construct a model of these models, i.e. a simplified representation of the true model structure. In this paper, we refer to this “simplified model of a model” as a “meta-model”, and use the term “model” for the original model that the meta-model is supposed to reflect.

In normal model construction, a model is supposed to be 
inspired by and validated against experimental results. In 
the meta-model construction that we are seeking to 
describe, validation should take place against the results 
of the original model. In other words, we assume that the original model is available and that it has been appropriately validated for a suitable number of situations. The purpose of the meta-model is then to replace the original model whenever the data requirements of the original model cannot be met.

In ERA, an important class of meta-models is quantitative structure-activity relationships (QSARs) (see, e.g. Jensen 2006). QSARs are available to predict or estimate quantitative properties of chemicals on the basis of other quantitative properties of the same chemical. In QSAR theory, the basic assumption is that many substance-specific parameters are correlated, at least for specific groups of substances. Missing parameters can therefore in principle be estimated on the basis of established QSARs. Although a data-demanding multimedia fate and exposure model might be run on the basis of a large number of QSAR estimates, the fact that so many estimates would be combined into one overall model result leads us to adopt a different approach. Our approach in this paper is based on the fate factors obtained from a recent and widely acknowledged multimedia fate and exposure model, in this case USEtox™ (Rosenbaum et al. 2008); these results are statistically connected with a subset of selected input parameters that are needed for the USEtox™ model. The exact choice of the set of input parameters is made by scanning databases for the availability of the individual parameters, combined with statistical information. The meta-model relationships themselves are established on a purely statistical basis. The normal recommendations (Cronin and Schultz 2003) for calibration and validation in QSAR theory are followed in the approach presented.

2 Methods 

The development of the meta-model proceeds through the 
following steps: 

1. Grouping of dependent variables that can be modelled 
together 

2. Division of the data set into a calibration set and a validation set

3. Selection of data transformation and scaling 

4. Calibration of the model 

5. Location of appropriate X variables 

(a) Optimisation/trimming of the model by deselection of insignificant X variables, recalibration and calculation of linear calibration coefficients

6. Validation of meta-model 

The procedure for interpreting the results of the individual steps is described in detail by Wold et al. (2001).

This section discusses the following elements in more 
detail: 

• The multimedia fate and exposure model that served as a 
basis for the derivation of the meta-model (i.e. USEtox™) 

• The data set that was used to create a calibration and 
validation set 

• The statistical techniques that were used to derive the 
meta-model 

2.1 The multimedia fate and exposure model 

A central element in LCIA is the characterisation step (ISO 2006), where the category indicator results are calculated, in most cases on the basis of characterisation factors. As a general rule, the emitted amount of a certain chemical is multiplied by the characterisation factor that connects that chemical to a certain impact category, after which the results are aggregated across chemicals within one impact category:

$$CIR_j = \sum_i CF_{ji}\, m_i \tag{1}$$

where $m_i$ is the amount of a certain chemical released to compartment $i$, $CF_{ji}$ is the characterisation factor that connects the chemical in compartment $i$ to impact category $j$ (and thus to the compartment in/by which the effect is manifested), and $CIR_j$ is the category indicator result for impact category $j$. The characterisation factors encapsulate the information on the fate, exposure and effect of the chemical; they are derived from the characterisation model.

In the toxicity-related impact categories, the characterisation factor is built from a number of separate elements (see also Rosenbaum et al. 2008):

• The aspect of fate, symbolised by the fate factor (FF); 

• The aspect of exposure or intake, symbolised by the 
exposure factor (XF); 

• The aspect of effect, symbolised by the effect factor 
(EF); and 

• The combined aspect of fate and human exposure, 
symbolised by the intake fraction (iF).

The aspect of exposure covers several intake routes for 
humans, such as inhalation of air and ingestion through 
crops, fish and dairy. For ecotoxicity, the aspect of intake is 
left out, and the impacts are defined right in the receiving 
compartment itself. In this way, two structures of the 
characterisation factor can be discerned: 

$$CF_{ji} = EF_j \, XF_j \, FF_{ji} \tag{2}$$

for ecosystems, with j = freshwater and i = emission compartment for the impact category freshwater aquatic ecotoxicity, j = soil for the impact category terrestrial ecotoxicity, etc., and

$$CF_{ji} = EF_j \sum_k \sum_l iF_{lki} \tag{3}$$

for man, where j = man for the impact category human toxicity, k represents receiving compartments (air, freshwater, soil, etc.) and l represents intake routes (air, crops, fish, dairy, etc.).
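The two structures can be sketched as plain functions. This sketch assumes that the ecosystem factor is the product of effect, exposure and fate factors, and that the human-toxicity factor is the effect factor multiplied by intake fractions summed over receiving compartments k and intake routes l; the function names, dictionary keys and numbers are hypothetical.

```python
def cf_ecosystem(ff_ji, xf_j, ef_j):
    """Ecosystem CF for emission compartment i and receiving compartment j (Eq. 2 form)."""
    return ef_j * xf_j * ff_ji


def cf_human(ef, intake_fractions):
    """Human-toxicity CF for one emission compartment i (Eq. 3 form).

    intake_fractions -- dict: (receiving compartment k, intake route l) -> iF
    """
    return ef * sum(intake_fractions.values())


# Hypothetical values, for illustration only.
print(cf_ecosystem(ff_ji=10.0, xf_j=0.5, ef_j=2.0))
print(cf_human(3.0, {("air", "inhalation"): 1e-6, ("freshwater", "fish"): 2e-6}))
```

In both structures the fate factor enters multiplicatively, which is why simplifying the fate modelling reduces the data demand of ecotoxicological and human toxicological characterisation alike.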

As is evident from both characterisation approaches, the 
fate factor plays a central role in the characterisation of both 
ecotoxicological and human toxicological impact potentials. 

Whereas the effect factors are determined from measured or estimated (usually by QSAR) effect measures, the fate factors all have to be calculated by multimedia fate modelling. The way the fate factors are modelled thus determines the fate-related data demand of a chemical characterisation in LCA and plays a crucial role in the data demand related to the characterisation of chemical emissions in LCA. The exposure factors could be approached similarly to the fate factors, but in this study we focus solely on simplifying the fate factor modelling. To obtain a complete simplified methodology, similar simplified meta-models for the exposure factors are needed. Effect factors, on the other hand, can be obtained from existing alternative sources such as US EPA (2009).


2.2 The data set 

As part of the USEtox™ documentation, Huijbregts et al. (2010) have made available the substance-specific data for 3,073 organic chemicals, along with the model results including fate factors. This data set served as the basis for the statistical model derivations in this paper.

In general, the fate factors of organic chemicals depend on a large number of physico-chemical properties, such as molecular weight, water solubility, octanol-water partition coefficient and compartment-specific degradation rates. Some of these parameters are widely available (such as the molecular weight and water solubility), while others (such as compartment-specific degradation rates) are available for only a few substances. Generally, and not surprisingly, the cheaper a property is to measure, the more likely it is that the property can be located for a given chemical.

All of the 3,073 organic chemicals in the data set described above were included in our study, and the fate factors of all 3,073 chemicals were compiled for emissions to continental urban air, continental rural air, continental freshwater, continental seawater, continental natural soil and continental agricultural soil, in total six emission compartments. The USEtox™ compound-specific fate factors consist of 66 parameters for each chemical (11 fate factors for each emission compartment), all of which are needed to perform a full characterisation in USEtox™ (i.e. calculation of characterisation factors for emission to all possible emission compartments and final compartments). The descriptive statistics of the dependent and independent variables used as the basis for the development of the meta-models are presented in Tables 1 and 2 in Appendix A.

As the dependent variables can be grouped in various ways, we decided to let data availability determine the grouping of the variables and thereby optimise the data-wise applicability of the meta-model. In this way, two meta-model derivation approaches were defined. Basically, we derived 66 meta-models (one for each combination of emission compartment and final compartment), grouped according to emission compartment (i.e. six meta-model groups), based on five (approach 1) or six (approach 2) of the eight parameters included in the USEtox™ minimum data set, corresponding to approx. 63% and 75% of the minimum data demand of the original model. The six selected parameters were molecular weight, octanol-water partition coefficient, vapour pressure at 25°C, water solubility at 25°C, degradation rate in air and degradation rate in water. In the first approach, which relies on five parameters, only one of the two degradation rates was included: for each meta-model group, the degradation rate with the higher model importance was retained.
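The selection rule of the two approaches can be sketched as follows. The importance scores passed in are hypothetical inputs (for example variable-importance values from a preliminary fit, which is an assumption), and breaking ties in favour of the air degradation rate is a choice of this sketch, not of the study.

```python
# The four inputs shared by both approaches, using the symbols defined for Eq. 5.
COMMON_INPUTS = ["Mw", "Kow", "Pvap25", "Sol25"]
MINIMUM_DATA_SET_SIZE = 8  # parameters in the USEtox minimum data set


def select_inputs(approach, importance_kdeg_air, importance_kdeg_water):
    """Return the independent variables for one meta-model group and the
    fraction of the minimum data set they represent.

    importance_* -- hypothetical model-importance scores of the two degradation rates
    """
    if approach == 2:
        inputs = COMMON_INPUTS + ["kdegA", "kdegW"]
    elif approach == 1:
        # Keep only the degradation rate with the higher importance (ties -> air).
        if importance_kdeg_air >= importance_kdeg_water:
            inputs = COMMON_INPUTS + ["kdegA"]
        else:
            inputs = COMMON_INPUTS + ["kdegW"]
    else:
        raise ValueError("approach must be 1 or 2")
    return inputs, len(inputs) / MINIMUM_DATA_SET_SIZE


# Hypothetical importance scores for an air-emission group.
print(select_inputs(1, importance_kdeg_air=1.4, importance_kdeg_water=0.6))
print(select_inputs(2, importance_kdeg_air=1.4, importance_kdeg_water=0.6))
```

The returned fractions, 5/8 = 0.625 and 6/8 = 0.75, are the approx. 63% and 75% data demands quoted above.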
Since linear and bilinear models assume linear data relationships, and since the variables selected for the model derivation are of physical, chemical and/or biological origin and hence tend to display skewed distributions, the variables have to be transformed in such a way that the dependent variables can be assumed to be a linear function of the independent variables. Fate factors can easily span ten orders of magnitude or more (see Table 2 in Appendix A) due to the large differences in the properties of the chemicals, and such large differences in scale can create a biased picture in the least-squares fit of regression analysis. To increase the probability of a linear dependency between the dependent and independent variables, these were logarithmically transformed. We used the base-10 logarithm, abbreviated as Log. Logarithmic transformation further compresses the scale differences into a much narrower range, with far fewer potentially influential data points. An additional advantage of the logarithmic transformation is that the differences in scale of the fate factors are reduced; after all, fate factors are typically specified per kilogram of chemical compound emitted.
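A minimal sketch of the base-10 log transformation, using hypothetical fate factor values, shows how a ten-orders-of-magnitude span collapses to a range of about 10 log units:

```python
import math


def log10_transform(values):
    """Base-10 logarithm of strictly positive values (fate factors, Pvap25, Sol25, kdeg, ...)."""
    transformed = []
    for v in values:
        if v <= 0:
            raise ValueError(f"cannot log-transform non-positive value {v}")
        transformed.append(math.log10(v))
    return transformed


# Hypothetical fate factors spanning ten orders of magnitude.
fate_factors = [1e-4, 3e-2, 2.0, 5e3, 1e6]
log_ff = log10_transform(fate_factors)
print(max(fate_factors) / min(fate_factors))  # ratio on the raw scale, about 1e10
print(max(log_ff) - min(log_ff))              # range on the log scale, about 10
```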

The results of the statistical estimation techniques (see below) are scale-dependent, i.e. they depend on the magnitude of the variables. To avoid giving certain independent parameters more weight than others due to their magnitude, all parameters are scaled to unit variance, thereby making the results independent of the units of the variables. All parameters are transformed to unit variance by dividing them by their standard deviation around the mean.
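Unit-variance scaling can be sketched with the standard library. The optional mean-centring (together giving "autoscaling") is an assumption of this sketch, since the text prescribes only division by the standard deviation; the two input columns are hypothetical log-transformed values.

```python
import statistics


def scale_to_unit_variance(column, center=True):
    """Divide a variable by its sample standard deviation around the mean.

    center=True also subtracts the mean (autoscaling); this option is an
    assumption of the sketch, the text only prescribes unit variance.
    """
    sd = statistics.stdev(column)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("a constant variable cannot be scaled to unit variance")
    shift = statistics.mean(column) if center else 0.0
    return [(x - shift) / sd for x in column]


# Two hypothetical log-transformed inputs with very different spreads.
log_mw = [2.0, 2.5, 3.0]
log_kow = [-1.0, 3.0, 7.0]
print(scale_to_unit_variance(log_mw))
print(scale_to_unit_variance(log_kow))
```

After scaling, both inputs have the same spread, so neither dominates the least-squares fit merely because of its units.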

2.3 The statistical techniques 

The statistical technique used to derive the meta-models is 
related to two different activities: 

• Estimation of the most appropriate model and its 
coefficients through regression analysis; and 

• Validation of the predictive capability of the model by 
methods taken from QSAR-practice. 

Below, both aspects are discussed. 

The derivation of the meta-model from the model results takes place on the basis of regression analysis. This widely used statistical tool for identifying multilinear relationships between several independent variables (say, $x_1, x_2, \dots$) and one dependent variable (say, $y$) is assumed to be known to the reader. The general form of such a model is:

$$y = a_0 + a_1 x_1 + a_2 x_2 + \dots \tag{4}$$

where $a_0, a_1, a_2, \dots$ are constants that are to be estimated.
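For Eq. 4 on its own, the constants can be estimated by ordinary least squares. The sketch below uses NumPy's least-squares solver on hypothetical, noise-free synthetic data, so the known coefficients are recovered exactly up to rounding.

```python
import numpy as np


def fit_ols(X, y):
    """Estimate a_0, a_1, ..., a_p of Eq. 4 by ordinary least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(X.shape[0]), X])  # column of ones carries a_0
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Hypothetical noise-free data generated from y = 1 + 2 x1 - 0.5 x2.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1]
print(fit_ols(X, y))
```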
The nature of the problem dealt with in this paper forces us to reconsider one assumption of the ordinary least-squares regression model: the independence of the independent variables. In the present context, the set of independent variables is formed by parameters such as the molecular weight, the vapour pressure, the solubility, the octanol-water partition coefficient and the degradation rate in water. As is known from QSAR theory, there are approximate relationships between some of these variables. Hence, the assumption of independence will be violated to some extent in an actual data set.
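One common way to quantify such violations is the variance inflation factor (VIF), a standard collinearity diagnostic that is not part of the study itself. The data below are synthetic: log solubility is deliberately made to fall with log Kow, mimicking a QSAR-like relationship, while log Mw is drawn independently.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the others."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residual = target - others @ coef
        r2 = 1.0 - (residual @ residual) / np.sum((target - target.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs


# Synthetic (hypothetical) inputs: log_sol depends strongly on log_kow.
rng = np.random.default_rng(1)
log_kow = rng.normal(3.0, 1.0, size=200)
log_sol = -0.9 * log_kow + rng.normal(0.0, 0.1, size=200)
log_mw = rng.normal(2.3, 0.2, size=200)
print(variance_inflation_factors(np.column_stack([log_kow, log_sol, log_mw])))
```

Large VIFs for the two related inputs, and a value near 1 for the independent one, are the kind of pattern that motivates the regression variants discussed next.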

Alternatives to the classical regression model have been developed, partly with the purpose of obviating the assumption of independence. These variants are known under names such as principal component regression (see Esbensen 2000; Vigneau et al. 1997), ridge regression (see Vigneau et al. 1997) and partial least-squares regression (PLSR) (see Esbensen 2000). In the remainder of this paper, the PLSR method is employed, for the following additional reasons:

• It allows for the modelling of dependent, noisy variables (even with missing observations)

• It allows for the modelling of multiple dependent variables (say, $y_1, y_2, \dots$)

• It is available in software (e.g. SIMCA-P+, Umetrics (2009)) that allows for the identification of the optimal set of independent variables, as well as the optimal transformations of these variables

Please refer to Appendix B for a short introduction to 
PLSR theory. For a more detailed introduction to the PLSR 
theory, please refer to Martens and Dardenne (1998), 
Martens and Martens (2001), Esbensen (2000) and Wold 
et al. (2001). 
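As a hedged illustration only, the sketch below implements single-response PLS regression (PLS1) with the standard NIPALS algorithm in NumPy. It is not necessarily the algorithm or the settings used by SIMCA-P+: it only mean-centres (without the unit-variance scaling described earlier), and it handles one dependent variable rather than several. The data are synthetic, with two nearly collinear predictors.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS regression (NIPALS) on mean-centred data.

    Returns (coefficients, intercept) such that y_hat = X @ coefficients + intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean    # residuals, deflated component by component
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f                  # direction of maximum covariance with y
        norm = np.linalg.norm(w)
        if norm == 0:                # nothing left to explain: stop early
            break
        w = w / norm
        t = E @ w                    # scores
        tt = t @ t
        p = E.T @ t / tt             # X loadings
        c = f @ t / tt               # y loading
        E = E - np.outer(t, p)       # deflate X
        f = f - c * t                # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    if not W:
        raise ValueError("no PLS component could be extracted (constant y?)")
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coefficients = W @ np.linalg.solve(P.T @ W, q)  # B = W (P'W)^-1 q
    intercept = y_mean - x_mean @ coefficients
    return coefficients, intercept


def pls1_predict(X, coefficients, intercept):
    return np.asarray(X, dtype=float) @ coefficients + intercept


# Synthetic data with two nearly collinear predictors (hypothetical).
rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
X = np.column_stack([x1, x1 + rng.normal(0.0, 0.05, size=100), rng.normal(size=100)])
y = 0.5 + 1.0 * X[:, 0] + 1.0 * X[:, 1] - 2.0 * X[:, 2] + rng.normal(0.0, 0.1, size=100)

b2, b0_2 = pls1_fit(X, y, n_components=2)  # fewer components than predictors
print(b2, b0_2)
```

With as many components as linearly independent predictors, PLS1 reproduces the ordinary least-squares solution; using fewer components is what lets PLSR cope with nearly collinear inputs.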

Apart from selecting the best model and calibrating its coefficients, one should determine the quality of the model's predictive capability. Regression models of the type derived in this paper are comparable to the models typically encountered in QSAR. In this study, we calculated R², the coefficient of determination, which can be interpreted as the fraction of variance explained by the model; Q²(cum), the cumulative fraction of the total variation of x or y that can be predicted by the components (estimated by cross-validation); and the root mean square error of prediction (RMSEP), which is the standard deviation of the prediction residuals (or, more precisely, errors) and is calculated as presented in Appendix C.
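The three statistics can be sketched as below, assuming standard textbook definitions; the RMSEP formula actually used in the study is the one in Appendix C, which may differ, for instance in its denominator. The cumulative Q²(cum) reported by the software aggregates over the extracted components and is not reproduced here. The observed and predicted values are hypothetical.

```python
import math


def _one_minus_ratio(observed, predicted):
    """1 - (sum of squared errors) / (total sum of squares around the observed mean)."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    ss_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_err / ss_tot


def r_squared(observed, fitted):
    """Coefficient of determination, from fitted (calibration) values."""
    return _one_minus_ratio(observed, fitted)


def q_squared(observed, cv_predicted):
    """Q^2 = 1 - PRESS / SS, with PRESS from cross-validated predictions."""
    return _one_minus_ratio(observed, cv_predicted)


def rmsep(observed, predicted):
    """Root mean square error of prediction over n records."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


# Hypothetical observed and predicted Log(FF) values for four validation records.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
print(r_squared(obs, pred), rmsep(obs, pred))
```

R² and Q² share one formula; what differs is whether the predictions come from the calibrated model or from cross-validation.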

Common to all QSAR techniques are the non-standardised recommendations for the development and use of QSARs. All of these recommendations are, however, aimed at QSARs developed to estimate structure-biological and/or structure-chemical activity relationships such as biodegradation, toxicity and physico-chemical properties. The models presented in this paper are instead designed to estimate fate factors to facilitate the calculation of characterisation factors in LCA. In this way, the meta-models do not model structure-activity relations but a chemical property-model relation. In general, there are no formal guidelines for developing QSARs (Cronin and Schultz 2003), but politically determined recommendations on common QSARs have been presented (see, e.g. European Chemical Bureau 2003). It is unclear whether recommendations like these apply to the type of models presented here.

A crucial point in QSAR and general model development 
is the approach taken for validating the model. Based on the 
aspects presented in Appendix C, it was concluded that test 
set validation would give the best estimate of the prediction 
performance of models of the type developed here based on 
the data set summarised in Tables 1 and 2 in Appendix A. In 
our study, 616 records (20%) of the data set (3,073 records 
in total) were randomly selected and used as validation set. 
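A minimal sketch of the random test-set split, assuming simple random sampling without replacement; the seed is hypothetical and only makes the split reproducible.

```python
import random


def split_calibration_validation(n_records, n_validation, seed=None):
    """Randomly assign record indices to a validation set; the rest form the calibration set."""
    rng = random.Random(seed)
    validation = sorted(rng.sample(range(n_records), n_validation))
    validation_set = set(validation)
    calibration = [i for i in range(n_records) if i not in validation_set]
    return calibration, validation


cal, val = split_calibration_validation(3073, 616, seed=42)
print(len(cal), len(val), round(100 * len(val) / 3073, 1))
```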


3 Results 

As presented in Tables 1 and 2, 2 × 66 emission- and effect-compartment-specific meta-models were derived applying both the 63% and the 75% data demand approach. All 132 validation plots for all possible combinations of emission compartments and effect compartments are presented in Appendices D (approach 1, 63% data demand) and E (approach 2, 75% data demand).

Based on the meta-model coefficients obtained from both approaches (see Appendices F and G), it is possible to construct the approximated multiple linear models for all combinations of emission and effect compartment. We do, however, recommend using the exact meta-models (please refer to Appendix H for the complete SIMCA-P+ files). The example below illustrates how the approximated fate factor for emission and effect in urban air is constructed according to the approach 1 results (see Table 1 in Appendix F).


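$$
\begin{aligned}
\mathrm{Log}(FF_{\mathrm{airU,airU}}) = {} & -2.42 - 6.55\times10^{-1}\,\mathrm{Log}(\mathrm{Mw}) + 1.19\times10^{-2}\,\mathrm{Log}(\mathrm{Kow}) + 1.29\times10^{-4}\,\mathrm{Log}(\mathrm{Pvap25}) \\
& + 7.79\times10^{-3}\,\mathrm{Log}(\mathrm{Sol25}) - 5.58\times10^{-1}\,\mathrm{Log}(\mathrm{kdegA}) + 1.68\times10^{-1}\,\mathrm{Log}(\mathrm{Mw})^2 \\
& - 2.19\times10^{-4}\,\mathrm{Log}(\mathrm{Kow})^2 + 1.73\times10^{-4}\,\mathrm{Log}(\mathrm{Pvap25})^2 + 8.60\times10^{-5}\,\mathrm{Log}(\mathrm{Sol25})^2 \\
& - 5.83\times10^{-2}\,\mathrm{Log}(\mathrm{kdegA})^2 + 4.76\times10^{-3}\,\mathrm{Log}(\mathrm{Mw})\,\mathrm{Log}(\mathrm{Kow}) - 7.26\times10^{-3}\,\mathrm{Log}(\mathrm{Mw})\,\mathrm{Log}(\mathrm{Pvap25}) \\
& + 8.38\times10^{-4}\,\mathrm{Log}(\mathrm{Mw})\,\mathrm{Log}(\mathrm{Sol25}) - 2.57\times10^{-2}\,\mathrm{Log}(\mathrm{Mw})\,\mathrm{Log}(\mathrm{kdegA}) \\
& + 1.72\times10^{-4}\,\mathrm{Log}(\mathrm{Kow})\,\mathrm{Log}(\mathrm{Pvap25}) - 7.66\times10^{-5}\,\mathrm{Log}(\mathrm{Kow})\,\mathrm{Log}(\mathrm{Sol25}) \\
& + 3.63\times10^{-3}\,\mathrm{Log}(\mathrm{Kow})\,\mathrm{Log}(\mathrm{kdegA}) - 5.01\times10^{-4}\,\mathrm{Log}(\mathrm{Pvap25})\,\mathrm{Log}(\mathrm{Sol25}) \\
& - 2.84\times10^{-3}\,\mathrm{Log}(\mathrm{Pvap25})\,\mathrm{Log}(\mathrm{kdegA}) + 2.02\times10^{-3}\,\mathrm{Log}(\mathrm{Sol25})\,\mathrm{Log}(\mathrm{kdegA}) \\
& - 1.51\times10^{-2}\,\mathrm{Log}(\mathrm{Mw})^3 + 9.66\times10^{-6}\,\mathrm{Log}(\mathrm{Kow})^3 + 6.74\times10^{-6}\,\mathrm{Log}(\mathrm{Pvap25})^3 \\
& + 3.96\times10^{-6}\,\mathrm{Log}(\mathrm{Sol25})^3 - 2.07\times10^{-3}\,\mathrm{Log}(\mathrm{kdegA})^3
\end{aligned}
\tag{5}
$$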


where FF = fate factor, airU = urban air (s/m³), Mw = molecular weight (g/mol), Kow = octanol-water partition coefficient (unitless), Pvap25 = vapour pressure at 25°C (Pa), Sol25 = water solubility at 25°C (mg/L), kdegA = degradation rate in air (1/s) and kdegW = degradation rate in water (1/s). Note that all the numbers representing the coefficients $a_0$, $a_1$, etc. have a dimension. Figure 1 shows an example validation plot of the observed and predicted values of the fate factor for emission and effect in urban air with approach 1.
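To make the structure of the approximated model concrete, the sketch below evaluates the approach 1 urban-air polynomial given above for a single chemical. It assumes that the published coefficients apply directly to base-10 logarithms of the raw inputs in the units listed above, which the text implies but does not state explicitly; as recommended above, the exact SIMCA-P+ meta-models should be preferred for actual characterisation. The function and dictionary names are hypothetical.

```python
import math

# Coefficients of the approach 1 model for emission to and effect in urban air,
# transcribed from the text. Keys are exponent tuples over
# (Log Mw, Log Kow, Log Pvap25, Log Sol25, Log kdegA).
COEFFICIENTS = {
    (0, 0, 0, 0, 0): -2.42,
    (1, 0, 0, 0, 0): -6.55e-1, (0, 1, 0, 0, 0): 1.19e-2, (0, 0, 1, 0, 0): 1.29e-4,
    (0, 0, 0, 1, 0): 7.79e-3, (0, 0, 0, 0, 1): -5.58e-1,
    (2, 0, 0, 0, 0): 1.68e-1, (0, 2, 0, 0, 0): -2.19e-4, (0, 0, 2, 0, 0): 1.73e-4,
    (0, 0, 0, 2, 0): 8.60e-5, (0, 0, 0, 0, 2): -5.83e-2,
    (1, 1, 0, 0, 0): 4.76e-3, (1, 0, 1, 0, 0): -7.26e-3, (1, 0, 0, 1, 0): 8.38e-4,
    (1, 0, 0, 0, 1): -2.57e-2, (0, 1, 1, 0, 0): 1.72e-4, (0, 1, 0, 1, 0): -7.66e-5,
    (0, 1, 0, 0, 1): 3.63e-3, (0, 0, 1, 1, 0): -5.01e-4, (0, 0, 1, 0, 1): -2.84e-3,
    (0, 0, 0, 1, 1): 2.02e-3,
    (3, 0, 0, 0, 0): -1.51e-2, (0, 3, 0, 0, 0): 9.66e-6, (0, 0, 3, 0, 0): 6.74e-6,
    (0, 0, 0, 3, 0): 3.96e-6, (0, 0, 0, 0, 3): -2.07e-3,
}


def log_ff_urban_air(mw, kow, pvap25, sol25, kdeg_air):
    """Approximated Log(FF) for emission to and effect in urban air (approach 1).

    Inputs are raw values in the units listed in the text; all must be positive.
    """
    logs = [math.log10(v) for v in (mw, kow, pvap25, sol25, kdeg_air)]
    total = 0.0
    for exponents, coef in COEFFICIENTS.items():
        term = coef
        for value, power in zip(logs, exponents):
            term *= value ** power
        total += term
    return total


# Hypothetical inputs: with every input equal to 1, all log terms vanish
# and only the intercept remains.
print(log_ff_urban_air(1.0, 1.0, 1.0, 1.0, 1.0))
```

Storing the polynomial as exponent tuples keeps the 26 terms (intercept, 5 linear, 5 quadratic, 10 cross and 5 cubic) easy to check against the equation term by term.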

4 Discussion 

The summarised regression statistics of the 2 × 66 derived fate factor meta-models presented in Tables 1 and 2 indicate that a large part of the observed variance in the fate factors can be explained by the PLSR-derived linear models. As presented in Tables 1 and 2, it seems plausible to assume that the most important fate pathway is the compartment-specific degradation process in the emission compartment (e.g. degradation in air for emission to air).

In general, it is possible to explain a large part of the variance in the fate factors of the 11 effect compartments modelled in USEtox™ by the PLSR-derived linear models summarised in Tables 1 and 2. In the air emission scenarios (emission to continental urban or continental rural air), approach 1 achieves reasonable regression coefficients (R² = 0.61-0.86), while the slightly more data-demanding approach 2 (data demand increased by approx. 12 percentage points) yields noticeably improved regression coefficients (R² = 0.71-0.87). The highest correlation between observed and predicted fate factors for emission to air compartments is observed for the air effect compartments (exposure via urban air, continental air and global air). In the water emission scenarios (emission to continental freshwater and continental seawater), approach 1 achieves good regression coefficients (R² = 0.62-0.93), while the slightly more data-demanding approach 2 yields even better regression coefficients (R² = 0.78-0.95). The highest correlation between observed and predicted fate factors for emission to water compartments is observed for the air effect compartments (exposure via urban air, continental air and global air). We conjecture that this reflects the much simpler role of the air compartments in fate models. In the soil emission scenarios (emission to continental natural soil and continental agricultural soil), approach 1 achieves poor to good regression coefficients (R² = 0.41-0.94), while the slightly more data-demanding approach 2 yields improved regression coefficients in the range R² = 0.56-0.90. The highest correlation between observed and predicted fate factors for emission to soil compartments is again observed for the air effect compartments (urban air, continental air and global air), probably for the same reason as conjectured above.

Table 2 Validation results of the 66 meta-models obtained according to model approach 2 (making use of 75% of the minimum data set in USEtox™), grouped according to emission compartment

Fig. 1 Validation plot (n = 616) for the meta-model capable of predicting fate factors for characterisation of effects in urban air by emission to continental seawater. This model was obtained by application of model derivation approach 1, applying 63% of the minimum data set for USEtox™. FFobs: observed fate factors [obtained from the USEtox™ data set (Huijbregts et al. 2010)]; FFpred: predicted fate factors (obtained by application of the meta-model)

The purpose of the linear models is to mimic the way USEtox™ treats certain combinations of independent variables, and whether the linear models have been derived from fate factors calculated from real independent data (measured compound-specific input) or from estimated data sets will most likely not influence the parameterisation of the derived linear meta-models significantly. The data origin is unimportant because USEtox™, like any other model, treats estimated and measured data in the same way. In the present case, we did not use soil and sediment degradation rates: both parameters are difficult to estimate precisely and, even more importantly, measured degradation rates for these two compartments belong to the more exclusive group of laboratory fate data, available only for chemicals with known problematic properties that are used in considerable quantities.

The presented meta-models have all been derived from data on a broad group of organic chemicals. It may well be that deriving a separate meta-model for, say, chlorinated hydrocarbons yields even better model performance. Separate meta-models would be needed in particular for metals and other speciating inorganic chemicals, which are in many respects different from organic chemicals.

5 Conclusions 

It has been demonstrated that large amounts of the variance observed in fate factors obtained from USEtox™ characterisation can be explained by simple linear models of the same type as presented in Eq. 5. Deriving simple models from complex models potentially opens up the creation of model compilations suitable for specific combinations of data availability or for the characterisation of specific emission situations. Statistical derivation ensures a significant correspondence between the full model and the meta-models, and hence compatibility and aggregation potential of their results.

An important aspect of deriving the presented type of meta-models is that the relevance of the individual independent data is illuminated. The full SIMCA-P+ 12.0 model (approx. 50 MB) is available from the corresponding author upon request.